29 research outputs found

    LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors

    To meet the needs of high-performance computing, the Cell Broadband Engine has many features that differ from those of traditional processors, such as a large number of synergistic processor elements, large register files, and the ability to hide main-storage latency by running computation and DMA transfers concurrently. Exploiting those features requires the programmer to carefully tailor programs and simultaneously deal with various performance factors, including locality, load balance, communication overhead, and multi-level parallelism. These factors, unfortunately, depend on each other; an optimization that enhances one factor may degrade another. This paper presents our experience optimizing LU decomposition, one of the commonly used algebra kernels in scientific computing, on the Cell Broadband Engine. The optimizations exploit task-level, data-level, and communication-level parallelism. We study the effects of different task distribution strategies, prefetching, and software caching, and explore the tradeoffs among different performance factors, stressing the interactions between different optimizations. This work offers some insights into optimization on heterogeneous multi-core processors, including the selection of programming models, considerations in task distribution, and the holistic perspective that optimization requires.

    A Measurement of the Proton Structure Function $F_2(x,Q^2)$

    A measurement of the proton structure function $F_2(x,Q^2)$ is reported for momentum transfer squared $Q^2$ between 4.5 GeV$^2$ and 1600 GeV$^2$ and for Bjorken $x$ between $1.8\cdot10^{-4}$ and 0.13 using data collected by the HERA experiment H1 in 1993. It is observed that $F_2$ increases significantly with decreasing $x$, confirming our previous measurement made with one tenth of the data available in this analysis. The $Q^2$ dependence is approximately logarithmic over the full kinematic range covered. The subsample of deep inelastic events with a large pseudo-rapidity gap in the hadronic energy flow close to the proton remnant is used to measure the "diffractive" contribution to $F_2$. Comment: 32 pages, ps, appended as compressed, uuencoded fil

    OPTIMAL SOFTWARE PIPELINING UNDER RESOURCE CONSTRAINTS
